IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS-PART B: CYBERNETICS, VOL. XX, NO. X 



1 



Orthogonal Least Squares Algorithm for the 
Approximation of a Map and its Derivatives with a 

RBF Network 



Carlo Drioli and Davide Rocchesso 



O 

o 
o 

c 

00 
O 



> 
OS 

m 
o 
\o 
o 
o 
o 

o 



X 



Abstract — Radial Basis Function Networks (RBFNs) are 
used primarily to solve curve-fitting problems and for non- 
linear system modeling. Several algorithms are known for 
the approximation of a non-linear curve from a sparse data 
set by means of RBFNs. However, there are no procedures 
that permit to define constrains on the derivatives of the 
curve. In this paper, the Orthogonal Least Squares algo- 
rithm for the identification of RBFNs is modified to provide 
the approximation of a non-linear 1-in 1-out map along with 
its derivatives, given a set of training data. The interest on 
the derivatives of non-linear functions concerns many identi- 
fication and control tasks where the study of system stability 
and robustness is addressed. The effectiveness of the pro- 
posed algorithm is demonstrated by a study on the stability 
of a single loop feedback system. 

Keywords — Radial Basis Function Networks, OLS learn- 
ing, curve fitting, iterated map stability, nonlinear oscilla- 
tors. 



I. Introduction 

THE Orthogonal Least Squares (OLS) algorithm Q is 
one of the most efficient procedures for the training 
of Radial Basis Function Networks (RBFN). A RBFN is a 
two-layer neural network model especially suited for non- 
linear function approximation, and appreciated in the fields 
■of signal processing ^j, [0, non-linear system modeling, 
identification and control Jq], ||, and time-series pre- 
diction §,§. 

Despite of the fact that in many identification and con- 
trol tasks the stability of the identified system depends on 
the gradient of the map M , jnj , the problem of efficiently 
approximating a non-linear function along with its deriva- 
tives seems to be rarely addressed. In jn|, some the- 
oretical results as well as some application examples are 
found that apply to generic feedforward neural networks. 

In this paper, an extended version of the OLS algorithm 
for the training of I-in I-out RBFNs is proposed, which 
permits to approximate an unknown function by specify- 
ing a set of data points along with its desired first-order 
derivatives. 

The paper is organized as follows: in Section |l[ the OLS 
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algorithm is reviewed and modified to add control over the 
derivative of the function to be approximated. The ex- 
tension to higher order derivatives is introduced in Section 



III. Application examples in the field of single loop feed- 
back systems are given in Section [V. In Section [v|, the 
conclusions are presented. 

II. Orthogonal Least Squares Learning 
Algorithm 

The OLS learning algorithm is traditionally tied to the 
parametric identification of RBF networks, a special two- 
layer neural network model widely used for the interpola- 
tion and modeling of data in multidimensional space. In 
the following we will restrict the discussion to the 1-in 1- 
out RBFN model, which is a mapping / : R — > R of the 
form 



H 



f(x) = b + ^2 Wi4>(x, mi), 



(1) 



where x € M is the input variable, <f>(-) is a given non-linear 
function, b, Wi and m„ I < i < H , are the parameters, and 
H is the number of radial units. The RBFN can be viewed 
as a special case of the linear regression model 



H 



t(k) = b + ^2wiPi(k) + e(k), 



(2) 



where t(k) is the desired A:-th output sample, e(k) is the 
approximation error, andpi(fc) are the regressors, i.e. some 
fixed functions of x(k), where x{k) are the input values 
corresponding to the desired output values t(k): 



Pi(k) = ^{xik)^.,). 



(3) 



In its original version, the OLS algorithm is a proce- 
dure iteratively selects the best regressors (radial basis 
units) from a set of available regressors. This set is com- 
posed of a number of regressors equal to the number of 
available data, and each regressor is a radial unit cen- 
tered on a data point. The selection of radial unit cen- 
ters is recognized as the main problem in the paramet- 
ric identification of these models, while the choice of the 
non-linear function for the radial units does not seem to 
be critical. Although gaussian-shaped functions are often 
preferred, spline, multi-quadratic and cubic functions are 
valid alternatives. Here, we will use the cubic function 
(j)(x,m) — (\\x — m||) 3 , where || • || denotes the euclidean 
norm and m denotes the center of the radial unit. 
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A. Classic OLS algorithm 

Say {x(k),t(k)}, k — 1,2,.. .,N, is the data set given 
by TV input-output data pairs, which can be organized 
in two column vectors x = [x(l) • • • x(N)] T and t = 
[t(l) ■ ■ ■ t(N)] T . The model parameters are given in vectors 
m = [mi • • •m//] T , w = [w\ ■ ■ -wh] t and b = [b], where 
H is the number of radial units to be used. Arranging the 
problem in matrix form we have: 



t=[P 1] 



w 

b 



+ e 



(4) 



with 



P = [ Pi • • • Ph 

</>0(l),TOl) 

0(x(JV),mi) 



<j>(x(l),m H ) 
4>{x{N),m H ) 



(5) 



where Pi = \4>{x{l), mi) . . . <j>(x(N), m,i)] T are regressor 
vectors forming a set of basis vectors, e = [e(l) • • • e(N)] T 
is the identification error, and 1 = [1 . . . 1] T is a unit col- 
umn vector of length TV. The least squares solution of this 
problem satisfies the condition that 



t=[P 1 ] 



w 

b 



(6) 



is the projection of t in a vector space spanned by the 
regressors. If the regressors are not independent, the con- 
tribution of each regressor to the total energy of the desired 
output vector is not clear. The OLS algorithm iteratively 
selects the best regressors from a set by applying a Gram- 
Schmidt orthogonalization, so that the contribution of each 
vector of this new orthogonal base can be determined in- 
dividually among the available regressors. 

B. Modified OLS algorithm 

The classic algorithm selects the best set of regressors 
from the ones available, and determines the output layer 
weights for the identification of the desired in-out map, 
but docs not explicitly controls the derivative of the func- 
tion. Wc propose to modify this procedure so to permit 
to specify the desired value of the function derivative in 
each data point. The data set will then be organized in 
three vectors x = [x(l) ■ ■ ■ x(N)] T , t = [t(l) • • • t(N)] T , and 
= [ti(l) • • • ti(N)] T , x and t being the input-output 
pairs and being the respective derivatives. It has to be 
noted that the original OLS algorithm selects each radial 
unit from a set of units, each of which is centered on a input 
data point. The maximum number of units is then limited 
to the number of data points. When we add requirements 
on the derivative of the function, a further constraint to the 
optimization problem is added, and the number of units to 
be selected in order to reach the desired approximation may 
be higher then the number of data points. A possible choice 
is to augment the input vector with points where there is 
no data available, and to build the set of N e regressors on 
this extended vector. 



The algorithm can be summarized as follows: 
• First step, initialization: the set of regressors for se- 
lection is obtained by centering the N e radial units, and 
the error reduction ratio (err) for each regressor vector is 
computed. Given the regressor vectors 

Pi = [4>{x{l),x(i)),...,<f>{x{N),x(i))] T , l<i<N e , (7) 

and defined the first-iteration vectors 

ui,i = Pi, l<i< N e . (8) 

The error reduction ratio associated with the i-th vector is 
given by 



err M = « ! t) 2 /((u^u 1 , ! )(t J t)). 



(9) 



In a similar way, the regressor vectors for the derivative of 
the map are computed: 



(1) _ ^{x(l),x(i)) d<t>(x(N),x(i)) 



p = 



dx 



dx 



f, l<i<N e 



(10) 



and the first-iteration vectors are defined: 

lM = pf\ !<*<N e . (11) 

The error reduction ratio for the derivative is: 

grad_err M = a^jVCa^.O^V))), (12) 

1 < i < N e . 

The erri.i and grad.errj i represent the error reduction 
ratios caused respectively by ui,i and li^, and the total 
error reduction ratio can be computed by 

tot-erri^ = A erri^ + (1 — A) grad_err x i , (13) 

where A weights the importance of the map against its 
derivative. The index i\ is then found, so that: 

tot.erri^j = maxjtot.erri^, 1 < i < N e }. (14) 

i 

The regressor giving the largest error reduction ratio is 
selected and removed from the set of available regressors. 
The corresponding center is added to the set of selected 
centers: 



ui = u Ml = p n ; 



li — h,h 



p (1) - 



mi = x{i\). 



(15) 



(16) 



(17) 



• /i-th iteration, for h = l,...,H and H < N e : the 

regressors selected in the previous steps, having indexes 
ii, ih-i, have been removed from the set of available re- 
gressors. Before computing the error reduction ratio for 
each regressor still available, the orthogonalization step is 
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performed which makes each regressor orthogonal with re- 
spect to those already selected: 



h-l 



Uh,i = Pi - ^(ujpj)/(ujuj)uj, i ^ ii,« 2 , —,ih-i; 



(18) 



h-l 

h,i = P.[ 1) -^(lJpfVaJl.)!., i^ii,ia,...,ih-ii 
j'=i 



(19) 



err h < = (u£ ^) /((uLu/,,i)(t t)), i ^ i x ,h, ..., ih-i\ 



(20) 



grad.err M = (l^ t ( 1 )) 2 /((l^l /M )(t( 1 ) t«)), (21) 

i 7^ ii,i2, ...,ih-i; 



tot_err M = Aerr, M + (1 - A) grad.err^, 
i 7^ h,i2, —,ih-i- 



(22) 



As before, the regressor with maximum error reduction ra- 
tio is selected and removed from the list of availability, and 
its center is added to the set of selected centers: 

tot_err ft . i(i = max{ tot.err^i, i^i x ,i 2 , (23) 

i 

(24) 



u/j = Uh,i h ; 
m h = x(i h ). 



(25) 
(26) 



« Final step, computation of output layer weights : once 
the H radial units have been positioned, the remaining 
w and b parameters can be found with a Moore-Pcnrosc 
matrix inversion: let us call Ph — [uiUa • • • Uj] and = 
[I1I2 ■ • ■ Ih) the two sets of selected regressors, and let 1 = 
[1 . . . 1] T and = [0 . . . 0] T be two column vectors of length 
N. Then we have 



t 



whose solution is 



w 

b 



H 
H 



H 

>(D 

H 



1 





1 





w 

b 



t 



(27) 



(28) 



Usually, it is convenient to stop the procedure before the 
maximum number of radial units has been reached, as soon 
as the identification error is considered to be acceptable. To 
this purpose, one can use equation ( p8|) a t iteration h to 
compute the identification error in (]27|) []. 

^^Note that in this case the length of vector w and the number of 
columns of matrices P in equation (pgj) is h instead of H 



C. Example 

Let us consider, as an example, the fitting of a step-like 
data set, where the derivative is arbitrarily constrained. 
The data set is shown in Fig. [I], along with the result 
of the parametric identification routine. It can be seen 
how an unlikely derivative was chosen in the critical zone 
to highlight the properties of the model. Fig. ^| shows 
the interpolating properties of the resulting RBF Network 
when computed on an input interval which is denser than 
the original input data set. 

Function Approximation (40 Data Points, 72 Radial Units) 
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Fig. 1 

Result of the training procedure applied to the fitting of a 
step-like data set ( + , upper figure), with arbitrary 

DERIVATIVE CONSTRAINT (+, MIDDLE FIGURE). In UPPER AND 
MIDDLE FIGURES, + IS THE DESIRED OUTPUT AND THE CONTINUOUS 
LINE IS THE ACTUAL OUTPUT. THE PROBLEM REQUIRED 72 RADIAL 
UNITS TO FIT 40 DATA POINTS, WITH AN IDENTIFICATION ERROR LESS 
THAN 10 -9 IN MAGNITUDE. 



III. Higher order derivatives 

The extension of the algorithm for the identification of a 
map and its derivatives of order higher than one is straight- 
forward. Given that <f> is continuous and has continuous 
derivatives up to order r, then the derivatives of order up to 
r can be identified for the map /. The data set is organized 



+ 1 vectors x = [a:(l) • ■ ■ x(N)] J 



[t(i)-OT 



t« = [t x (l) ■ ■ ■ h(N)] T , tM = Ml) • • • U(N)] T , where 
t^ l \k) is the desired i-th derivative for the /c-th data point. 
In the first step, a different set of regressors can be com- 
puted for each derivative order: 



p, = [^(x(l), x(i)), . . . , <f>(x(N), x(i))] T , !<*<N e 



{d) _ d d cb(x(l),x(i)) 

Pl [ dx d 



d d (/)(x(N),x{i)) T 
dx d ' 
Ki< N„, Kd<r. 



(29) 



(30) 
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RBFN Generalization Properties 



Input 



corresponding centers are added to the set of selected cen- 
ters: 



tot_err M(i = max{ tot.err^, i ^ ii,i 2 , ...,ih-i}; (37) 



(38) 





1 : 

ir N " " " ,: 







5 10 15 20 25 30 35 40 

Input 

Fig. 2 

Interpolation properties of the identified RBF Network: 

THE OUTPUT OF THE MODEL WAS COMPUTED ON A INPUT SET WHICH 
IS DENSER THAN THE ORIGINAL ONE (X: DATA SET). 



^ ■■■ l> r ^ the orthogonalized re- 
gressor vectors selected in the (h — l)-th iteration, in the 



If we now call u Jh _ 1 , l ifc _ x , , 



be computed as the weighted sum of these terms: 



h-l 



Uh,i = Pi - y^(ujpj)/(ujuj)uj, i ^ -,ih-i; 



(31) 



err fcii = (uLt) /((u h i u h)i )(t t)), i ^ i x ,i 2 , ...,ih-f, 



(32) 



ig = p«> - ]T(lJpf )/(lJl i )l j! i * ix.ia, 
3=1 

(33) 

errg = (lg T tW) 2 /((lS T lS)(tW r t^))), 

i^i 1 ,i 2 ,...,ih-i; (34) 



tot.err/j^ = Aoerr/^ + A d errg 



(35) 
(36) 



The regressors with maximum error reduction ratio are se- 
lected and removed from the list of availability, and the 



Jd) _ Jd) 

l h — l h,i h 



U d ^ r; 



m h = x(i h ). 



(39) 



(40) 



If we now let 



Ph 

P (i) 

r H 



H 



[Ui • • • u ff j 

iWl 



(41) 



[i 



(r) 
1 



be the final set of orthogonal regressors obtained from the 
selection procedure, we can compute the output layer pa- 
rameters by solving the matrix equation 



in equa- 
can then 


t 




r p H 

P (i) 


1 " 









p(r) 
L r fl 


. 



w 

b 



(42) 



IV. Application Examples 

The OLS algorithm for the identification of a map and its 
derivatives with RBF networks is demonstrated using some 
examples from the field of feedback non- linear systems. 

A. Single loop feedback system and the Hopf bifurcation 
Theorem 

The single loop feedback circuit depicted in Fig. || is an 
example of autonomous non-linear system capable of dif- 
ferent dynamical behaviors, such as decaying oscillation, 
stable periodic motion (including constant), and chaos. We 
will consider the case where G(z) is made of two cascaded 
linear elements, i.e. a delay line Dl(z) of given length L, 
and a low-pass filter H(z). The function / is assumed to 
be a three-fixed points smooth function crossing the origin 
with slope Si, and having slopes S 2 and S3 = S 2 in the 
other two points (see Fig. ^-a). The topology of fig. || is 
of particular interest in the field of sound synthesis, for the 
physically inspired modeling of musical instruments with 
sustained sound GjJ], and has been object of investi- 
gation by the authors for the construction of generalized 
musical tone generators The length of the delay line, 
which can be seen as the medium where sound propagates 
(such as a flute pipe or a violin string) , is inversely propor- 
tional to the pitch of the signal generated, and represents 
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fO) 



G(z) 



Fig. 3 

Single loop feedback system 



an example of sound control parameter with a clear phys- 
ical meaning. The shape of the non-linear map and its 
fixed-point derivatives are recognized to be responsible for 
the stability of periodic motion, for the spectral content of 
the signal, and for the time-constant of transient extinc- 
tion. We don't care here about the shape of the map, and 
we focus on the fixed points and their derivatives. The 
condition for instability of the fixed point in the origin, 
and thus the condition for the system to oscillate, can be 
stated in terms of the Nyquist plot of the open loop trans- 
fer function G(z) — Dl(z)H(z). Say that — q + jO denotes 
the leftmost intersection point of the Nyquist plot of G(z) 
with the real axis. In order to let the system oscillate, a 
necessary condition for Si is Si < —1/q (l4|. A different 
role is assumed for the slope S2, which is responsible for 
limiting the growth of the system state. To this purpose, a 
slope S2 > — 1 is needed at some distance from the origin, 
in correspondence of the other two fixed points. 

As a practical example, a low-pass filter H{z) — 0.4 + 
0.3z _1 and a delay length L = 100 samples are consid- 
ered. The Nyquist plot of G(z) has the smallest intersec- 
tion point with the real axis in —0.7 + jO which gives a 
maximum slope of -1.4286 over which the oscillation will 
not occur. The length L = 100 for the delay line gives a 
period length T p = 200, which corresponds to a pitch of 
110.25 Hz at a sample rate of 22050 Hz. In Fig. |, the 
time evolution of the system is shown for a map / with 
fixed points (0,0), (-100,100), (100,-100), for different 
values of the slope Si , and for a random initial state in the 
range [—0.1,0.1]. It can be seen that the slope Si can be 
used to drive the system to a periodic steady state and to 
control the transient velocity. 

The example shown is a particular case of the more gen- 
eral Hopf bifurcation theorem fl6[| in its frequency domain 
formulation. It is interesting to point out that the single 
loop feedback systems exemplified in E^] are discretized 
versions of simple RLC electrical circuits, with at least a 
non-linear component (e.g., a tunnel diode). The result is 
a feedback scheme as the one in Fig. where G(z) is a 




2000 4000 6000 8000 10000 
t (samples) 





2000 4000 6000 8000 10000 
t (samples) 



2000 4000 6000 8000 10000 
1 (samples) 



Fig. 4 

Simulation of the circuit for different values of the 
derivative of the fixed point in the origin. a) shapes of the 
function /(■) for s\ = —1.4 (dotted line) , s\ = —1.65 (dashed 

LINE) AND Si = —2 (DASHDOTTED LINE) . B,C,d) TIME EVOLUTION 
FROM RANDOM INITIAL CONDITIONS IN THE THREE CASES. THE SLOPE 
Si = —0.1 OF THE OTHER TWO FIXED POINTS IS HELD CONSTANT IN 
THE THREE CASES, AND LIMITS THE GROWTH OF THE SYSTEM 
EVOLUTION. 



second-order II R transfer function, and no delay lines are 
considered in the loop. In these circuits, a stable almost 
sinusoidal oscillation is reached, whose frequency and am- 
plitude are functions of the second and third derivatives of 
the non- linear map /, evaluated in the equilibrium point 
(i.e., dc operating point), which is solution of the equation 
G(0)f(y)-y = 0. 



B. Stability control in feedback systems 

Still referring to the closed loop feedback system of Fig 
|§|, we are now interested in the stabilization of a gi ven 
periodic motion. With respect to the case of Section IV 
[a|, we're facing the dual situation, where we ignore the 
transient part of the process and we're interested in the 
shape of the period of the resulting time series. 



Let us call y = [yi, y%, . . . , j/t p ] T the desired period and 
assume that the length of the period is even, i.e. T p = 2L. 
For simplicity we consider the case that the filter H(z) is 
not present, thus the linear system G(z) is just a delay line 
£>i(z), which has to be of length L, as seen in the previous 
example. The construction of the non-linear map able to 
produce the desired periodic waveform is straightforward, 
and relies on the training set y computed using the data 
points: 



y = yiuy 2 , 



(43) 
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where 



^1 = U (yk'VL+k), 



k=l 



and 



^2 = (J (j/L + fe, 



Ilk) 



(44) 



(45) 



fe=i 



In Fig. ||, the computation of the training set from the 
desired output process is illustrated, as well as the approx- 




Time (samples) 
b) Reconstruction map (training set) 





/"i-- 9 \ \ 

\ ~ v 1 






■ \ t 

y 2 \ j 


\* i • 




/ 1 


{ \ / V /« \ 

V s e ; H 



Fig. 5 

Training data (a) and approximation of the unknown 
function / (b, dashed curve). a desired derivative of 0.3 in 
magnitude was imposed for all eight data points 



period, i.e. is x = [yi, U2, ■ ■ ■ , Vl\ T i the non-linear map 
iteratively computes the other half. The stability and ro- 
bustness with respect to additive noise is granted by the 
derivative of the map, which has to be less then one in 
magnitude. Fig. ^ shows the time evolution of the sys- 
tem whose non-linear map data point derivatives are con- 
strained to a magnitude of 0.3, and whose evolution is tem- 
porarily disturbed with additive noise, with a SNR of 46 
dB. 

One might be curious about the possibility of reaching 
a desired stable periodic motion from a quasi-zero random 
state, controlling the slope in the origin as in the previous 
example. Despite the fact that the solution appears to 
be in the combined use of the skills given in the previous 
examples, whether such control would be possible or not 
with a time-invariant 1-in 1-out non-linear map, seems to 
be a non-trivial problem. 

If the filter H(z) is not omitted in G(z), the control 
of stability of the single loop feedback system of Fig. || 
can be conveniently approached by studying the Jacobian 
matrix J of the map F which describes the state transition 
x(ri + 1) = F(x(n)) at every successive time step, x being 



rA 



rA 



rA A 



rA 



^A rA 



10 20 30 40 50 60 70 80 90 




time (samples) 

Fig. 6 

Closed loop system: rejection of additive noise, a) Time 
evolution of the system. b) distance from the target 
evolution when noise is added to the loop, from sample 21 
to sample 25 



the global state of the system. Let the linear element G(z) 



be, as before, the cascade of a delay line Dl(z) 



of 



length L and a low-pass filter H(z). We are interested in 
leading the system to a stable periodic motion. In a steady 
state situation, the state = [x%, X2, ■ ■ ■ , xl] t of the delay 
line undergoes a linear distortion due to the filtering stage. 
This is represented by the L-point circular discrete-time 
convolution 11711 



y k = (h © x)k, 1 < k < L 



(46) 



To restore the original state of the delay line the non-linear 
map / can be shaped on the base of a training set given by 
equation (fl3"|), with 



yi = [J(yi,yL+i), 



(47) 



and 



^2 = \J{yL+i,Vi)- 



(48) 



i=l 



In general, the geometric locus given by the training set 
y will not necessarily be a curve of dimension 1, and the 
map will need to be unfolded in a higher dimensional space. 
We consider here the case where a one-dimensional map is 
sufficient to our purposes. If H(z) is a first order FIR filter 
with coefficients b\ and 62, the system can be given in its 



DRIOLI AND ROCCHESSO: APPROXIMATION OF A MAP AND ITS DERIVATIVES WITH A RBF NETWORK 



7 



state space form as 








" 




f{x L ) 




h 







b 2 f(x L ) 


x(n + 1) = 


1 





x(n) + 










1 








(49) 

where x(n) = [xo(n)xi(n) ■ ■ ■ Xi(n)] T is the global state of 
the system at time n. 

The Jacobian matrix of the state transition map, evalu- 



ated 



m x = [xqXi 



J(x) = 



■x L \ 


h 




is given by 



d 

b 2 d 




(50) 



where 



A 9f_ 

dxL 



(51) 



A periodic orbit [x(n) • • ■ x(n + 2L — 1)] of period 2L is 
asymptotically stable if the Jacobian J has eigenvalues of 
magnitude less than one for each point of the periodic orbit. 
The eigenvalues of J are the roots of the polynomial z L+1 — 
b\dz — b 2 d, and are plotted in Fig. ^ for L = 3, and for 
different values of d. The lower and the upper figures refer 
to two different low-pass filters H(z). 

Eigenvalues of J: roots of 1 0zM-5dz-5d 




it can be seen that J has eigenvalues |A| < 1 if |d| < 1.667. 
Thus, in order to have a stable and noise-robust periodic 
solution, the magnitude of the derivative of the map in each 
point of the training data must not exceed 1.667. 

Usually, a perturbation to the closed loop system is mod- 
eled with random noise added to the loop at a given point. 
However, it can be of some interest to vary the parameters 
of the linear components in the loop such as, for example, 
the low-pass filter H(z). This can be useful to control the 
spectral content of the resulting time series. The stability 
of the whole system is thus investigated by applying, for a 
short time window, a perturbation to the filter coefficients 
b\ and b 2 . Fig. [| shows the reaction of the system to a 
random perturbation, with an upper bound in magnitude 
of 0.02, occurring at sample time 44 and ending at sample 
time 48. It can be seen that the perturbation will be per- 
sistent for values of \d\ higher than 1.667 (upper figure), 
and that it will be rejected for |d| < 1.667 in a time that is 
shorter the lower we choose |d| (middle and lower figures). 
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Fig. 8 

Rejection of a perturbation of the coefficients of the 
filter H(z). The perturbation b 1 + Abi, hi + A62, occurring 

AT SAMPLE TIME 44 AND ENDING AT SAMPLE TIME 48, HAD UPPER 
BOUND IN MAGNITUDE |A6j| < 0.02, i = 1,2. 
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Fig. 7 

Magnitude of the roots of the polynomial z 4 — b\dz — bid (or 

EIGENVALUES OF THE JACOBIAN MATRIX J) FOR b\ = &2 = 0.5 
(UPPER FIGURE) AND &i = 0.1, 62 = 0.5 (lower figure). 

Let us focus the attention on the case where L = 3 and 
H{z) has coefficients b\ = 0.1 and hi = 0.5. From Fig. [t] 



V. Conclusions 

The use of the Orthogonal Least Squares algorithm to 
approximate a non-linear map with arbitrary derivatives 
with radial basis function networks has been investigated. 
A modified version of the classic OLS algorithm formula- 
tion has been proposed, which uses the same orthogonal- 
ization approach for both the regressors of the map and the 
regressors of its derivatives. The usefulness of the method 
has been illustrated on application examples from the field 
of control of single loop feedback systems, and we have 
stressed the importance of derivatives of the non- linear map 
to control important features such as stability, velocity of 
transients, and rejection of disturbances. 



8 



IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS-PART B: CYBERNETICS, VOL. XX, NO. X 



References 

[I] S. Chen, C. F. N Cowan, and P. M. Grant, "Orthogonal least 
squares learning algorithm for radial basis functions networks," 
IEEE Trans, on Neural Networks, vol. 2, no. 2, pp. 302-309, 
March 1991. 

[2] S. Haykin, Neural Networks. A Comprehensive Foundation, 

Macmillan, New York, 1994. 
[3] S. Haykin and J. Principe, "Making sense of a complex world," 

IEEE Signal Processing Mag., vol. 15, no. 3, pp. 66-81, May 

1998. 

[4] S. Chen and S. A. Billings, "Neural networks for nonlinear dy- 
namic system modelling and identification," Int. J. of Control, 
vol. 56, no. 2, pp. 319-346, 1992. 

[5] G. P. Liu, V. Kadirkamanathan, and S. A. Billings, "Vari- 
able neural networks for adaptive control of nonlinear systems," 
IEEE Trans, on Systems, Man, and Cybernetics- C, vol. 29, no. 
1, pp. 34-43, February 1999. 

[6] R. Langari, L. Wang, and J. Yen, "Radial basis function net- 
works, regression weights, and the expectation-maximization al- 
gorithm," IEEE Trans, on Systems, Man, and Cybernetics- A, 
vol. 27, no. 5, pp. 613-623, September 1997. 

[7] P. Yee and S. Haykin, "A dynamic regularized radial basis func- 
tion network for nonlinear, nonstationary time scries prediction," 
IEEE Trans, on Signal Processing, vol. 47, no. 9, pp. 2503-2521, 
September 1999. 

[8] M. Casdagli, "Nonlinear prediction of chaotic time series," Phys- 

ica D, vol. 35, pp. 335-356, 1989. 
[9] F. J. Romeiras, C. Grebogy, E. Ott, and W. P. Dayawansa, 

"Controlling chaotic dyanamical systems," Physica D, vol. 58, 

pp. 156-192, 1992. 
[10] J. T. Connor, R. D. Martin, and L. E. Atlas, "Recurrent neural 

networks and robust time series prediction," IEEE Trans, on 

Neural Networks, vol. 5, no. 2, pp. 240-254, March 1994. 

[II] K. Hornik, M. Stinchcombe, and H. White, "Universal approxi- 
mation of an unknown mapping and its derivatives using multi- 
layer feedforward networks," Neural Networks, vol. 3, pp. 551- 
560, 1990. 

[12] P. Cardaliaguet and G. Euvrard, "Approximation of a function 
and its derivatives with a neural network," Neural Networks, 
vol. 5, pp. 207-220, 1992. 

[13] M. E. Mclntyre, R. T. Schumacher, and J. Woodhouse, "On 
the oscillation of musical instruments," J. of Acoustical Soc. of 
America, vol. 74, no. 5, pp. 1325-1345, 1983. 

[14] X. Rodet, "Models of musical instruments from Chua's circuit 
with time delay," IEEE Trans, on Circuits and Systems, vol. 
40, no. 10, pp. 696-701, 1993. 

[15] C. Drioli and D. Rocchcsso, "Learning pseudo-physical models 
for sound synthesis and transformation," Proc. of IEEE Int. 
Conf. on Systems, Man, and Cybernetics, pp. 1085-1090, Octo- 
ber 1998. 

[16] L. O. Chua, "Nonlinear circuits," IEEE Trans, on Circuits and 
Systems, vol. CAS-31, no. 1, pp. 69-87, January 1984. 

[17] A. V. Oppenheim and R. W. Schafer, Discrete- Time Signal 
Processing, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1989. 



